AITopics | molotov cocktail

molotov cocktail

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How a fiery attack on Sam Altman's home unfolded

The Guardian | Apr-18-2026, 14:00:15 GMT

Sam Altman speaks during the BlackRock infrastructure summit on 11 March in Washington DC.

Molotov cocktail attack on OpenAI CEO's home comes amid growing discontent with artificial intelligence.

In the early hours of 10 April, a man approached the gate of OpenAI CEO Sam Altman's house in San Francisco and hurled a Molotov cocktail at the building before fleeing. Federal and California state authorities have charged Moreno-Gama with a range of crimes, including attempted arson and attempted murder. His parents issued a statement this week saying that their son had recently suffered a mental health crisis.

large language model, machine learning, natural language, …

Country:

North America > United States > District of Columbia > Washington (0.45)
North America > United States > California > San Francisco County > San Francisco (0.38)
Europe > Ukraine (0.06)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (0.48)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.36)

Technology:

Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

Bullwinkel, Blake; Russinovich, Mark; Salem, Ahmed; Zanella-Beguelin, Santiago; Jones, Daniel; Severi, Giorgio; Kim, Eugenia; Hines, Keegan; Minnich, Amanda; Zunger, Yonatan; Kumar, Ram Shankar Siva

arXiv.org Artificial Intelligence | Jul-8-2025

Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant threat to the safe and secure deployment of LLM-based systems. We study the effectiveness of the Crescendo multi-turn jailbreak at the level of intermediate model representations and find that safety-aligned LMs often represent Crescendo responses as more benign than harmful, especially as the number of conversation turns increases. Our analysis indicates that at each turn, Crescendo prompts tend to keep model outputs in a "benign" region of representation space, effectively tricking the model into fulfilling harmful requests. Further, our results help explain why single-turn jailbreak defenses like circuit breakers are generally ineffective against multi-turn attacks, motivating the development of mitigations that address this generalization gap.
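The core measurement described here, whether a response's intermediate representation lands in a "benign" or a "harmful" region, can be illustrated with a difference-of-means linear probe. This is a minimal sketch under stated assumptions, not the authors' method: the function names are hypothetical, and the toy 3-dimensional vectors stand in for hidden states that would in practice be extracted from one layer of a safety-aligned model.

```python
from statistics import fmean

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(components) for components in zip(*vectors)]


def fit_probe(benign: list[Vector], harmful: list[Vector]) -> tuple[Vector, float]:
    """Hypothetical difference-of-means probe.

    The direction points from the benign mean toward the harmful mean; the
    threshold is the projection of the midpoint between the two means, so
    representations on the harmful side score above zero.
    """
    mu_benign = mean_vector(benign)
    mu_harmful = mean_vector(harmful)
    direction = [h - b for h, b in zip(mu_harmful, mu_benign)]
    midpoint = [(h + b) / 2 for h, b in zip(mu_harmful, mu_benign)]
    return direction, dot(direction, midpoint)


def harmfulness_score(rep: Vector, direction: Vector, threshold: float) -> float:
    """Signed score: > 0 means the harmful side of the probe, < 0 the benign side."""
    return dot(direction, rep) - threshold


# Toy "hidden states" (assumed values) standing in for activations from one layer.
benign_reps = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.1]]
harmful_reps = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.1]]
direction, threshold = fit_probe(benign_reps, harmful_reps)

# One representation per conversation turn; values chosen only to show the mechanics.
turn_reps = [[0.9, 0.1, 0.0], [0.6, 0.4, 0.0], [0.3, 0.7, 0.0]]
turn_scores = [harmfulness_score(r, direction, threshold) for r in turn_reps]
```

Scoring each turn separately is what lets an analysis of this kind ask whether a multi-turn conversation keeps the model's outputs on the benign side of the probe even as the requests escalate.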

large language model, machine learning, natural language, …

arXiv:2507.02956

Country:

North America > United States (0.45)
Asia > Russia (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Safeguarding AI Agents: Developing and Analyzing Safety Architectures

Domkundwar, Ishaan; S, Mukunda N; Bhola, Ishaan

arXiv.org Artificial Intelligence | Sep-13-2024

AI agents, particularly those powered by large language models, have demonstrated exceptional capabilities in various applications where precision and efficacy are necessary. However, these agents come with inherent risks, including the potential for unsafe or biased actions, vulnerability to adversarial attacks, lack of transparency, and a tendency to generate hallucinations. As AI agents become more prevalent in critical industry sectors, implementing effective safety protocols becomes increasingly important. This paper addresses the critical need for safety measures in AI systems, especially ones that collaborate with human teams. We propose and evaluate three frameworks to enhance safety protocols in AI agent systems: an LLM-powered input-output filter, a safety agent integrated within the system, and a hierarchical delegation-based system with embedded safety checks. Our methodology involves implementing these frameworks and testing them against a set of unsafe agentic use cases, providing a comprehensive evaluation of their effectiveness in mitigating risks associated with AI agent deployment. We conclude that these frameworks can significantly strengthen the safety and security of AI agent systems, minimizing potential harmful actions or outputs. Our work contributes to the ongoing effort to create safe and reliable AI applications, particularly in automated operations, and provides a foundation for developing robust guardrails to ensure the responsible use of AI agents in real-world applications.
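The first of the three frameworks, an input-output filter, amounts to checking a request before the agent sees it and checking the agent's response before the user does. The sketch below illustrates only that control flow. It is an assumption-laden stand-in: the paper uses an LLM as the judge, whereas here any callable returning `True` for "unsafe" fills that role, and `keyword_classifier`, `FilteredAgent`, and the echoing agent are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# A classifier returns True when it judges a piece of text unsafe. In the paper
# this role is played by an LLM; here it is any callable (an assumption).
Classifier = Callable[[str], bool]
Agent = Callable[[str], str]

REFUSAL = "Request blocked by safety filter."


def keyword_classifier(blocklist: list[str]) -> Classifier:
    """Toy stand-in for an LLM safety judge: flags text containing any blocked phrase."""
    lowered = [term.lower() for term in blocklist]
    return lambda text: any(term in text.lower() for term in lowered)


@dataclass
class FilteredAgent:
    """Wraps an agent with an input check before it runs and an output check after."""

    agent: Agent
    input_is_unsafe: Classifier
    output_is_unsafe: Classifier

    def run(self, request: str) -> str:
        if self.input_is_unsafe(request):
            return REFUSAL  # the request never reaches the agent
        response = self.agent(request)
        if self.output_is_unsafe(response):
            return REFUSAL  # the agent ran, but its output is withheld
        return response


# Hypothetical agent that just echoes a plan for the request.
planner = FilteredAgent(
    agent=lambda req: f"Plan: {req}",
    input_is_unsafe=keyword_classifier(["disable the firewall"]),
    output_is_unsafe=keyword_classifier(["rm -rf"]),
)
```

For example, `planner.run("summarize the report")` passes both checks, while a request whose generated plan contains `rm -rf` clears the input check but is caught on output. Keeping the two checks as separate callables mirrors the idea that inputs and outputs may need different judges.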

agent, agent system, safety, …

arXiv:2409.03793

Country:

North America > United States (0.04)
Europe > Russia (0.04)
Europe > Finland (0.04)

Genre: Instructional Material > Online (0.46)

Industry:

Education (1.00)
Health & Medicine (0.94)
Law (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Does Refusal Training in LLMs Generalize to the Past Tense?

Andriushchenko, Maksym; Flammarion, Nicolas

arXiv.org Artificial Intelligence | Jul-16-2024

Refusal training is widely used to prevent LLMs from generating harmful, undesirable, or illegal outputs. We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?" to "How did people make a Molotov cocktail?") is often sufficient to jailbreak many state-of-the-art LLMs. We systematically evaluate this method on Llama-3 8B, GPT-3.5 Turbo, Gemma-2 9B, Phi-3-Mini, GPT-4o, and R2D2 models using GPT-3.5 Turbo as a reformulation model. For example, the success rate of this simple attack on GPT-4o increases from 1% using direct requests to 88% using 20 past tense reformulation attempts on harmful requests from JailbreakBench with GPT-4 as a jailbreak judge. Interestingly, we also find that reformulations in the future tense are less effective, suggesting that refusal guardrails tend to consider past historical questions more benign than hypothetical future questions. Moreover, our experiments on fine-tuning GPT-3.5 Turbo show that defending against past reformulations is feasible when past tense examples are explicitly included in the fine-tuning data. Overall, our findings highlight that the widely used alignment techniques -- such as SFT, RLHF, and adversarial training -- employed to align the studied models can be brittle and do not always generalize as intended. We provide code and jailbreak artifacts at https://github.com/tml-epfl/llm-past-tense.
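The headline numbers (1% with direct requests versus 88% with 20 reformulation attempts) are best-of-k figures. As we read the abstract, a request counts as jailbroken if the judge flags at least one of its first k attempts. A minimal sketch of that evaluation metric follows; `attack_success_rate` is a hypothetical name, and the judge labels are toy booleans, not data from the paper.

```python
def attack_success_rate(judgements: list[list[bool]], k: int) -> float:
    """Hypothetical best-of-k metric.

    judgements[i][j] is the judge's verdict on attempt j for request i.
    A request counts as a success if any of its first k attempts succeeded.
    """
    if not judgements:
        raise ValueError("need at least one request")
    if k < 1:
        raise ValueError("k must be at least 1")
    successes = sum(any(attempts[:k]) for attempts in judgements)
    return successes / len(judgements)


# Toy judge labels for four requests with three attempts each (not data from the paper).
labels = [
    [False, False, True],
    [False, False, False],
    [True, False, False],
    [False, True, False],
]
```

Here the rate rises from 0.25 at k=1 to 0.75 at k=3. Because extra attempts can only add successes, the metric never decreases as k grows, which is why comparing a single direct request against 20 attempts should be read as a comparison of attack budgets as well as of phrasing.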

arxiv preprint arxiv, harmful request, reformulation, …

arXiv:2407.11969

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SelfIE: Self-Interpretation of Large Language Model Embeddings

Chen, Haozhe; Vondrick, Carl; Mao, Chengzhi

arXiv.org Artificial Intelligence | Mar-25-2024

How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions on hidden embeddings also open up new avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of an individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control that erases harmful knowledge in LLM without supervision targets.

interpretation, self-interpretation, selfie, …

arXiv:2403.10949

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > Scotland (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

2 charged with hate crimes after black family's home is hit by Molotov cocktails and racist graffiti

Los Angeles Times | Sep-14-2016, 22:20:57 GMT

In a crime that shocked a California Delta community, a man and woman were charged with hate crimes Tuesday in connection with launching Molotov cocktails into the home of a black family in Antioch and spray-painting the residence with a swastika and racial slurs, police said. Roy Charles Sorvari, 27, of Antioch and Christyne Gail McDaniel, 25, of Brentwood face charges of arson and conspiracy to commit murder, mayhem, torture and assault with a deadly weapon, according to the Antioch Police Department. Sorvari and McDaniel were also charged with hate crime enhancements. They have each been ordered held on more than $1 million bail. The attack "sent shockwaves in the city of Antioch," Police Chief Allan Cantando said at a news conference Tuesday.

artificial intelligence, molotov cocktail, molotov cocktail and racist graffiti, …

Country:

North America > United States > California (0.28)
North America > United States > North Carolina > Buncombe County > Asheville (0.08)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.41)

Man Arrested for Throwing Molotov Cocktails at Google Street View Car

TIME - Tech | Jul-6-2016, 16:16:18 GMT

A man has been charged with felony arson after authorities say he threw Molotov cocktails at a Google car parked outside a company building in Mountain View, Calif. Raul Murillo Diaz, 30, threw several beer bottles turned into Molotov cocktails at a Google Street View car parked outside Google's building, prosecutors said, according to NBC Bay Area. While one of the bottles caused a fire, the car did not explode. Diaz told law enforcement he "felt Google was watching him and it made him upset," according to an affidavit. Diaz also told authorities he was involved in two other incidents related to Google, including burning one of Google's self-driving cars, but he has only been charged with one count of arson so far.

artificial intelligence, google street view car, molotov cocktail, …

Country: North America > United States > California > Santa Clara County > Mountain View (0.30)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Transportation > Ground > Road (0.66)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)
